Structural Classification of Parallel Computers

🔗💻🔀

Understanding Tightly Coupled and Loosely Coupled Systems

A detailed exploration of parallel computer architectures based on their structural design

⬇️

Introduction to Structural Classification 📚

Flynn's taxonomy focuses on the behavioral aspects of parallel computers and does not consider their structural design. Parallel computers can, however, also be classified by their structure, that is, by how their processors and memory modules are organized and interconnected.

🔍 Key Concepts

🧩

Parallel Computers

Systems with multiple processors working together

🔗

Interconnection Network

Links processors and memory modules

📊

Structural Classification

Based on how processors and memory are organized

🏗️ Two Main Architectures

🔗
Tightly Coupled

Shared memory systems

📦
Loosely Coupled

Distributed memory systems

Flynn's Taxonomy vs. Structural Classification 📊

🔍 Flynn's Taxonomy

Flynn's classification focuses on the behavioral aspects of parallel computers:

📝
SISD

Single Instruction, Single Data

📝➕📝
SIMD

Single Instruction, Multiple Data

📝➕📊
MISD

Multiple Instruction, Single Data

📝➕📊
MIMD

Multiple Instruction, Multiple Data

🏗️ Structural Classification

Structural classification, on the other hand, focuses on the physical organization of the system:

🔗
Tightly Coupled

Processors share global memory

📦
Loosely Coupled

Each processor has local memory

🔄 MIMD Systems

A parallel computer (MIMD) consists of multiple processors and shared memory modules or local memories connected via an interconnection network.

[Diagram: processors P1 and P2 and a memory module connected through an interconnection network]

Tightly Coupled Systems / Shared Memory Systems 🔗

🔍 Definition

In tightly coupled systems, multiple processors communicate through a shared global memory. This organization is called a shared memory computer or tightly coupled system.

✅ Characteristics

🔄

Shared Memory

Every processor communicates through a shared global memory

High Throughput

Preferable for high-speed real-time processing

🔗

Tight Coupling

Processors are closely connected through shared memory

🏗️ Organization

In tightly coupled system organization, multiple processors share a global main memory, which may have many modules. The processors also have access to I/O devices.

[Diagram: processors P1 and P2 connected through an interconnection network to Memory Module 1 and Memory Module 2]

🔌 Interconnection

Inter-communication between processors, memory, and other devices is implemented through various interconnection networks.
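A minimal sketch of the shared-memory idea, using operating-system processes as a stand-in for hardware processors. All the worker processes update one counter that lives in shared memory; the lock models the conflict resolution the interconnection network must provide when several processors touch the same memory location. The counts and process numbers are arbitrary example values.

```python
# Tightly coupled sketch: several processes communicate through one
# shared memory location, with a lock serializing conflicting accesses.
from multiprocessing import Process, Value, Lock

def worker(counter, lock, increments):
    for _ in range(increments):
        with lock:              # resolve access conflicts
            counter.value += 1  # read-modify-write on shared memory

def main():
    counter = Value("i", 0)     # an integer living in shared memory
    lock = Lock()
    procs = [Process(target=worker, args=(counter, lock, 1000))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value        # 4 processes x 1000 increments

if __name__ == "__main__":
    print(main())               # 4000
```

Without the lock, the four processes would race on the read-modify-write and the final count would usually be below 4000, which is exactly the kind of memory conflict a real shared-memory machine must manage.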

Interconnection Networks in Tightly Coupled Systems 🌐

🔍 Types of Interconnection Networks

🧠
PMIN

Processor-Memory Interconnection Network

📡
IOPIN

Input-Output-Processor Interconnection Network

⚠️
ISIN

Interrupt Signal Interconnection Network

🧠 Processor-Memory Interconnection Network (PMIN)

This network links the various processors to the different memory units.

🔗

Direct Connection

Connecting each processor directly to each memory module

🔀

Crossbar Switch

Can become very complex with many processors and memories

🪜

Multistage Network

Used instead of complex crossbar switches

⚠️

Conflict Resolution

Handles clashes when multiple processors access the same memory module

📡 Input-Output-Processor Interconnection Network (IOPIN)

This interconnection network is used for communication between processors and input/output (I/O) channels.

📝

Permission Control

Processors need permission from IOPIN to interact with I/O devices

🔌

I/O Channel Access

Manages which processor can access which I/O channel

⚠️ Interrupt Signal Interconnection Network (ISIN)

When one processor wants to interrupt another, the interrupt signal first travels to the ISIN.

🔄

Interrupt Relay

The ISIN forwards the interrupt signal to the destination processor

⏱️

Synchronization

Lets the ISIN synchronize processors by relaying interrupt signals

Failure Notification

If a processor fails, ISIN broadcasts a message to other processors

📊 ISIN Functions

🔄

Intermediary

Acts as an intermediary for interrupt signals between processors

🎯

Coordination

Coordinates and relays interrupt signals

📢

Notification

Notifies all processors of any processor malfunction

Cache Memory in Tightly Coupled Systems 🗄️

⏱️ Reducing Delay

Since every memory reference in a tightly coupled system goes through the interconnection network, instruction execution incurs a delay. To reduce this delay, each processor may use a cache memory for its frequent references.
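The benefit of a cache can be quantified with the standard effective-access-time formula: the average cost of a reference is a weighted mix of the fast cache access and the slow trip across the interconnection network. The hit ratio and latencies below are assumed example figures, not values from the text.

```python
# Effective (average) memory access time with a cache in front of
# the interconnection network. All numbers are illustrative.
def effective_access_time(hit_ratio, cache_ns, memory_ns):
    """Weighted average of cache hits and misses that go to shared memory."""
    return hit_ratio * cache_ns + (1 - hit_ratio) * memory_ns

# 90% of references served by a 2 ns cache, the remaining 10% by a
# 100 ns access to shared memory across the network:
print(effective_access_time(0.9, 2.0, 100.0))  # 11.8
```

Even a modest hit ratio shrinks the average latency far below the raw network-plus-memory cost, which is why per-processor caches are standard in tightly coupled systems.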

🏗️ Organization with Cache

[Diagram: P1 with Cache 1 and P2 with Cache 2 connected through an interconnection network to the shared memory]

💡 Benefits of Cache

Faster Access

Cache memory is faster than main memory

🔄

Reduced Network Traffic

Frequent accesses are handled locally

📈

Improved Performance

Overall system performance is enhanced

Modes of Tightly Coupled Systems 🔀

Shared memory multiprocessor systems can be further divided into three modes, based on how the shared memory is accessed.

🔄
UMA

Uniform Memory Access

🔀
NUMA

Non-Uniform Memory Access

🗄️
COMA

Cache-Only Memory Access

📊 Comparison of Modes

Mode | Memory Access              | Key Characteristic
UMA  | Uniform for all processors | All processors have equal access time
NUMA | Non-uniform                | Local memory access is faster than remote
COMA | Non-uniform                | Uses cache memories instead of local memories

Uniform Memory Access Model (UMA) 🔄

🔍 Definition

In this model, the main memory is uniformly shared by all processors in the multiprocessor system, and each processor has equal access time to the shared memory.

🏗️ Structure

[Diagram: P1, P2, and P3 connected over a shared bus to the shared memory]

✅ Characteristics

⏱️

Equal Access Time

All processors have same access time to memory

👥

Multi-user Environment

Used for time-sharing applications

🔄

Uniform Sharing

Memory is uniformly shared by all processors

💡 Example

Symmetric Multiprocessors (SMPs) are common examples of UMA systems where all processors have equal access to all memory locations.

Non-Uniform Memory Access Model (NUMA) 🔀

🔍 Definition

In shared memory multiprocessor systems, a local memory can be attached to each processor. The collection of all local memories forms the shared global memory; in this way, the global memory is distributed across the processors.

⏱️ Access Time Variation

Local Memory Access

Uniform and fast for its corresponding processor

🔗

Remote Memory Access

Slower and non-uniform, depends on location

🏗️ Structure

[Diagram: P1 with Local Memory 1 and P2 with Local Memory 2 connected via an interconnection network]

💡 Key Point

In NUMA, not all memory words are accessed in the same time: the access time depends on the location of the memory word relative to the requesting processor.

🖥️ Real-world Example

Modern server systems with multiple CPU sockets, where each CPU has its own local memory but can access the memory of other CPUs through an interconnection network.
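The non-uniformity can be pictured as a latency table indexed by which node issues the request and which node holds the word. The node layout and nanosecond figures below are assumed example values, not measurements from any real machine.

```python
# Toy NUMA latency model: access time depends on where the word lives
# relative to the requesting processor. Figures are illustrative only.
LATENCY_NS = {  # (processor_node, memory_node) -> access time in ns
    (0, 0): 80,  (0, 1): 200,
    (1, 0): 200, (1, 1): 80,
}

def access_time(processor_node, memory_node):
    """Look up the access latency for a given processor/memory pairing."""
    return LATENCY_NS[(processor_node, memory_node)]

print(access_time(0, 0))  # local access:  80
print(access_time(0, 1))  # remote access: 200
```

The diagonal entries (local accesses) are cheap while off-diagonal entries (remote accesses through the interconnection network) cost more, which is the defining property of NUMA.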

Cache-Only Memory Access Model (COMA) 🗄️

🔍 Definition

If, in the NUMA model, cache memories are used instead of local memories, the result is the COMA model. The collection of cache memories forms a global memory space.

🏗️ Structure

[Diagram: P1 with Cache 1 and P2 with Cache 2 connected via an interconnection network]

⏱️ Access Characteristics

🔍

Cache as Memory

Cache memories act as the main memory

🔀

Non-uniform Access

Remote cache access is also non-uniform

🌐

Global Memory Space

Collection of all caches forms global memory

💡 Advantage

COMA can provide better performance for applications with high data locality, as data can be moved to where it is needed.

Loosely Coupled Systems / Distributed Memory Systems 📦

🔍 Definition

In loosely coupled systems, processors do not share a global memory, because shared memory leads to memory conflicts, which slow down instruction execution.

🏗️ Organization

Each processor has a large local memory that is not shared with any other processor. Such a system consists of multiple processors, each with its own local memory and I/O devices, so every node forms a complete computer system in itself.

[Diagram: P1 with Local Memory 1 and P2 with Local Memory 2 connected via a message passing network]

📡 Communication

The nodes are connected via a message-passing interconnection network, through which processes communicate by exchanging messages.

🏷️ Alternative Names

🖥️

Distributed Multicomputer Systems

Each node has separate memory

🔗

Loosely Coupled Systems

Little interdependence between nodes

No-Remote Memory Access (NORMA) Systems 🚫

🔍 Definition

Since local memories can only be accessed by their attached processor, no processor is able to access remote memory. For this reason, these systems are also referred to as no-remote memory access (NORMA) systems.

🚫 Key Characteristic

🔒

Local Access Only

Processors can only access their own local memory

🚫

No Remote Access

Cannot directly access memory of other processors

📡 Communication Method

Communication between processors is achieved through message passing rather than shared memory access.

🏗️ Structure

[Diagram: P1 with Local Memory 1 and P2 with Local Memory 2 connected via a message passing network]

Message Passing Interconnection Network 📨

🔗 Connection

The message-passing interconnection network connects every node; how nodes communicate with messages depends on the type of interconnection network.

🔍 Types of Interconnection Networks

🚌

Shared Bus

For non-hierarchical systems

🔀

Crossbar Switch

Direct connection between all nodes

🪜

Multistage Network

Multiple stages of switching elements

🔗

Mesh/Torus

Grid-like connection patterns

📨 Message Passing Process

1. ✉️ The processor prepares the message
2. 📡 The message is sent through the interconnection network
3. 📥 The message is received by the destination processor
4. ⚙️ The destination processor processes the message

⚖️ Advantages and Disadvantages

Scalability

Easier to scale to a large number of processors

No Memory Conflicts

Each processor has its own memory

Communication Overhead

Message passing adds latency

Complex Programming

Requires explicit message passing

Real-world Examples 💻

🔗 Tightly Coupled Systems Examples

🖥️

Symmetric Multiprocessors (SMPs)

Multiple processors share a common memory and I/O system (e.g., multi-core processors in desktop computers)

🖥️

Uniform Memory Access (UMA) Systems

All processors have equal access time to memory (e.g., early Sun Enterprise servers)

🖥️

Non-Uniform Memory Access (NUMA) Systems

Modern server systems with multiple CPU sockets (e.g., AMD EPYC, Intel Xeon servers)

📦 Loosely Coupled Systems Examples

🌐

Computer Clusters

Collection of computers connected via a network (e.g., Beowulf clusters)

☁️

Grid Computing

Distributed computing across multiple administrative domains (e.g., CERN's Large Hadron Collider computing grid)

🌐

Distributed Systems

Systems where components located on different networked computers communicate by passing messages (e.g., Google's search infrastructure)

🔀 Comparison in Practice

System Type            | Best Use Case                                         | Real-world Example
Tightly Coupled (UMA)  | General-purpose computing, shared memory applications | Desktop/workstation with multi-core CPU
Tightly Coupled (NUMA) | High-performance computing, large database servers    | Enterprise server with multiple CPU sockets
Loosely Coupled        | Highly scalable applications, fault-tolerant systems  | Supercomputer clusters, cloud computing infrastructure

Conclusion 🏁

🔍 Key Takeaways

🔗

Tightly Coupled Systems

Processors share global memory, faster communication but potential conflicts

📦

Loosely Coupled Systems

Each processor has local memory, no conflicts but communication overhead

🔌

Interconnection Networks

Crucial for performance in both architectures

📊 Structural Classification vs. Flynn's Taxonomy

While Flynn's taxonomy classifies parallel computers based on instruction and data streams, structural classification focuses on the physical organization of processors and memory. Both are important for understanding parallel computer architectures.

🚀 Future Trends

Modern systems often combine elements of both tightly and loosely coupled architectures, creating hybrid systems that leverage the advantages of each approach. For example, a cluster of NUMA systems forms a loosely coupled system with each node being a tightly coupled system.

💡 Final Thought

The choice between tightly coupled and loosely coupled systems depends on the specific requirements of the application, including performance needs, scalability requirements, and programming complexity. Understanding the structural classification helps in designing and selecting the appropriate parallel computing architecture for a given problem.